We study the relation between the acquisition and analysis of data and quantum theory using a probabilistic and a deterministic model for photon polarizers. We introduce criteria for efficient processing of data and then use these criteria to demonstrate that efficient processing of the data contained in single events is equivalent to the observation that Malus' law holds. A strictly deterministic process that also yields Malus' law is analyzed in detail. We present a performance analysis of the probabilistic and deterministic model of the photon polarizer. The latter is an adaptive dynamical system that has primitive learning capabilities. This additional feature has recently been shown to be sufficient to perform event-by-event simulations of interference phenomena, without using concepts of wave mechanics. We illustrate this by presenting results for a system of two chained Mach-Zehnder interferometers, suggesting that systems that perform efficient data processing and have learning capability are able to exhibit behavior that is usually attributed to quantum systems only.
I. INTRODUCTION

Consider the schematic representation of an experiment in which a source emits objects that carry information represented by an angle $0 \le \psi < 2\pi$ (see Fig. 1). We want to determine this angle as accurately as possible, but there are some limitations on the equipment that is available to us, namely:

• We do not have a device that can measure the angle $\psi$ directly.

• We have detectors that can count the arrival (or passage) of individual objects.

• We can build a device, called the processor in what follows, that can direct an incoming object to one of its two output channels according to $\psi$, relative to the orientation $0 \le \phi < 2\pi$ of the processor itself. We can count the number of objects in each output channel by using the detectors.

• We do not have prior information about the angle $\psi$ itself, implying that there is no reason to assume that particular angles $\psi$ are more likely to occur than others. Note that this does not imply that $\psi$ is a random variable.

FIG. 1: Schematic representation of the event-by-event experiment. The source emits objects one at a time. Each object carries a message represented by an angle $\psi$. The processor combines this message with the angle $\phi$ that is controlled by the user, and sends the object through one of its two output channels. Detectors count the number of objects in each channel.

Given this scenario, the obvious question is: What kind of functionality should we build into the processor such that by counting $N$ events in the output channels, we obtain an accurate estimate for the angle $\psi$? Of course, the optimal design of the processor depends on additional constraints. Usually, we prefer devices of which the output does not change drastically when the input varies a little. Therefore we require that

1. The number of events generated in each output channel is most insensitive to small changes in $\theta = \psi - \phi$.

2. The performance of the processor should be insensitive to the actual value of $\theta$.

Using these two criteria, we consider two extreme realizations of the processor. First, we construct a simple probabilistic processor that operates according to the rules of probability theory and uses random numbers to transform the input data $\psi$ into a sequence of discrete output events. Then, we present a strictly deterministic processor that performs the same task as the simple probabilistic processor. In both cases, the general strategy to determine the optimal processor is the same: We search for a probabilistic or deterministic process that satisfies the criteria (1) and (2) mentioned earlier. Then we estimate the efficiency of the processor. As a convenient measure for the efficiency (or performance) of a processor, we take the number of different messages $M_D$ that can be extracted, with a specified level of certainty, from a record of $N$ bits, each bit representing an event in one of the two output channels.

The reader may have noticed that the scenario we described earlier applies to the measurement of the polarization of light in the regime where the signal from the detectors consists of discrete "clicks". In this case, the objects are represented by photons, the angle $\psi$ describes the polarization (which we cannot measure directly), the detectors may be photomultiplier tubes or semiconductor diodes, and the processor is a properly prepared calcite crystal. In general terms, we want to characterize the behavior of a system in terms of numerical quantities that we can obtain by repeating measurements that give us partial information only. This is a characteristic feature of quantum mechanics. In order not to become entangled in the difficulties with the interpretation of quantum theory and the measurement paradox in particular, in the theoretical analysis presented in this paper, we avoid the use of words such as photons and polarization. A remarkable result of this paper is that the search for an efficient (in the sense specified earlier) data processor yields probabilistic and deterministic processors that generate output events according to Malus' law.



II. PROBABILISTIC PROCESSOR 

The schematic diagram of the probabilistic processor is shown in Fig. 2. The input to the processor is an event that carries a message represented by the angle $\psi$. The presence of an event in one of the output channels is represented by a message that carries the variable $x = \pm 1$. We assume that the experimental data is in concert with the hypothesis of rotational invariance. That is, the number of events in the $x = +1$ and $x = -1$ channels only depends on the difference $\theta = \psi - \phi$ between the (unknown) angle $\psi$ and the orientation of the device $0 \le \phi < 2\pi$. Furthermore, the number of output events should be periodic in $\theta$ with a period of $\pi$.

FIG. 2: Schematic diagram of a probabilistic processor that transforms the difference between the input angle $\psi$ and the setting $\phi$ into a sequence of $x = \pm 1$ signals.

Let the probability $p(x|\theta)$ describe the process that transforms each input event into an output event $x = \pm 1$. By symmetry we have $p(x|\theta) = p(x|\theta + \pi)$ and if we assume that each input event generates exactly one output event we have

$$\sum_{x=\pm 1} p(x|\theta) = p(+1|\theta) + p(-1|\theta) = 1. \tag{1}$$

We also assume that there is no logical dependence between two output events $i$ and $j$, that is $p(x_i, x_j|\theta) = p(x_i|\theta)\,p(x_j|\theta)$ for all $i \ne j$. This implies that the correlation between the output events is zero. Then, this process generates Bernoulli trials.

Under these conditions, all information about the polarization is encoded in the measurable quantity

$$f(\theta) = \langle x \rangle = \sum_{x=\pm 1} x\,p(x|\theta) = 2p(+1|\theta) - 1. \tag{2}$$

From Eq. (2) it is clear that we can completely characterize the process by $p(\theta) \equiv p(+1|\theta)$.

Our task is to design the processor, that is to determine the function $p(\theta)$, such that a measurement of $f(\theta)$ gives us as much knowledge as possible about the unknown angle $\psi$.

Let us consider the data as collected by an observer who decides to record and analyze data sets of $N$ objects each. We assume that $\theta$ is fixed during this measurement. Each data set looks like $\{x_1, \ldots, x_N\}$ where $x_i = \pm 1$ for $i = 1, \ldots, N$. Let us assume that the number of $x_i = +1$ events in a particular data set is $n$. Recall that the processor generates Bernoulli trials. Therefore, the probability for observing this data set is given by the binomial distribution

$$P(n|\theta,N) = \frac{N!}{n!\,(N-n)!}\, p(\theta)^n \left(1 - p(\theta)\right)^{N-n}. \tag{3}$$

For the convenience of the reader, we first recall some well-known facts of probability theory. From Eq. (3) it follows that, as a function of $p(\theta)$, $P(n|\theta,N)$ reaches its maximum $\hat{P}(n|\theta,N)$ at $p(\theta) = \hat{p}(\theta) = n/N$. A simple calculation shows that

$$\frac{1}{N}\ln\frac{P(n|\theta,N)}{\hat{P}(n|\theta,N)} = \hat{p}(\theta)\ln\frac{p(\theta)}{\hat{p}(\theta)} + \left(1 - \hat{p}(\theta)\right)\ln\frac{1 - p(\theta)}{1 - \hat{p}(\theta)}. \tag{4}$$

For small values of $|p(\theta) - \hat{p}(\theta)|$, the Taylor series expansion of Eq. (4) yields

$$\frac{1}{N}\ln\frac{P(n|\theta,N)}{\hat{P}(n|\theta,N)} \approx -\frac{\left(p(\theta) - \hat{p}(\theta)\right)^2}{2\hat{p}(\theta)\left(1 - \hat{p}(\theta)\right)}, \tag{5}$$

showing that as a function of $p(\theta)$, $P(n|\theta,N)$ vanishes exponentially fast with $N$, unless $p(\theta) = \hat{p}(\theta) = n/N$. Therefore, from the point of view of the observer, the procedure is simple: As the observer knows the (yet to be determined) function $p(\theta)$, after measuring a data set $\{x_1, \ldots, x_N\}$, the observer finds $\theta$ by solving $p(\theta) = n/N = (1 + N^{-1}\sum_{i=1}^{N} x_i)/2$. The total number $n$ of $x_i = +1$ events contains all the available information about the difference $\theta = \psi - \phi$.

A rough estimate for the number of distinguishable messages $M_D$ that the probabilistic processor can encode with an error of approximately one percent can be obtained as follows. First, we use Eq. (3) to calculate the variance on $n$ and find that $\sigma^2(\theta) = Np(\theta)(1 - p(\theta))$. For sufficiently large but fixed $N$, the probability distribution Eq. (3) tends to the normal (Gaussian) distribution with mean $n = Np(\theta)$ and variance $\sigma^2(\theta)$. Therefore, the probability to observe $m$ ($1 \ll m \ll N$) instead of $n$ ($1 \ll n \ll N$) $x_i = +1$ events is approximately given by

$$P(m|\theta,N) \approx \frac{1}{\sqrt{2\pi\sigma^2(\theta)}}\exp\left(-\frac{(m-n)^2}{2\sigma^2(\theta)}\right). \tag{6}$$

From the properties of the normal distribution, it follows that the probability for the observed number $m$ of $x_i = +1$ events to lie in the interval $[n - 3\sigma(\theta), n + 3\sigma(\theta)]$ is larger than 0.997. Thus, the number of messages $M_D$ that can be encoded with a probability of error that is less than one percent is given by

(7)

Although this is a rough estimate, the result that the number of distinguishable messages is of the order of $\sqrt{N}$ is to be expected on general grounds, given the constraint that the processor generates probabilistic, Bernoulli-like events.
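The three-sigma statement above can be checked numerically against the exact binomial distribution. The following is a minimal sketch, not part of the original analysis: the function names and the values $N = 1000$ and $p = 0.75$ are illustrative assumptions.

```python
import math

def binomial_pmf(m, n_events, p):
    """Binomial probability of m successes in n_events trials (computed via logs)."""
    log_pmf = (math.lgamma(n_events + 1) - math.lgamma(m + 1)
               - math.lgamma(n_events - m + 1)
               + m * math.log(p) + (n_events - m) * math.log(1.0 - p))
    return math.exp(log_pmf)

def three_sigma_coverage(n_events, p):
    """Probability that the count lies within 3 sigma of its mean, and sigma itself."""
    mean = n_events * p
    sigma = math.sqrt(n_events * p * (1.0 - p))
    lo = max(0, math.ceil(mean - 3.0 * sigma))
    hi = min(n_events, math.floor(mean + 3.0 * sigma))
    coverage = sum(binomial_pmf(m, n_events, p) for m in range(lo, hi + 1))
    return coverage, sigma

# Illustrative values (assumptions): N = 1000 events, p = cos^2(30 degrees) = 0.75.
coverage, sigma = three_sigma_coverage(1000, 0.75)
print(coverage, sigma)
```

Because the width of the interval grows like $\sqrt{N}$ while the range of possible counts grows like $N$, the number of non-overlapping intervals grows like $\sqrt{N}$, which is the scaling quoted in the text.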

We now apply the criteria 1 and 2 of Section I to optimize the design, that is we want to determine the function $p(\theta)$ by which the probabilistic (Bernoulli) processor will generate the output events. In the foregoing analysis, we assumed that $\theta$ was fixed during the observation of $N$ events. Clearly, this is not a realistic assumption. In a real experiment, $\phi$ or $\psi$ fluctuates. Therefore, the best we can do is search for the probability $p(\theta)$ that is least sensitive to small changes in $\phi - \psi$. This is criterion 1 of Section I.

We determine this probability by considering the likelihood that the observed sequence of $x_i$'s was generated by $p(x|\theta + \epsilon)$ instead of $p(x|\theta)$ where $\epsilon$ is a small positive number. The larger this likelihood, the larger the probability that the observer draws the wrong conclusion from the data. The log-likelihood $L$ that the data was generated by $p(x|\theta + \epsilon)$ instead of by $p(x|\theta)$ is given by

$$\frac{L}{N} = \frac{1}{N}\ln\frac{P(n|\theta+\epsilon,N)}{P(n|\theta,N)} = \frac{n}{N}\ln\frac{p(\theta+\epsilon)}{p(\theta)} + \left(1 - \frac{n}{N}\right)\ln\frac{1 - p(\theta+\epsilon)}{1 - p(\theta)}. \tag{8}$$



According to criterion 1 of Section I, we have to find the probability $p(\theta)$ that minimizes $|L|$.

We first consider the case that $P(n|\theta+\epsilon,N) > P(n|\theta,N)$. Then, because $P(n|\theta,N)$ is not the maximum, we assign $p(\theta+\epsilon) = n/N$ as the most likely guess. A Taylor series expansion of Eq. (8) yields

$$\frac{1}{N}\ln\frac{P(n|\theta+\epsilon,N)}{P(n|\theta,N)} \approx \frac{\epsilon^2}{2p(\theta)(1-p(\theta))}\left(\frac{\partial p(\theta)}{\partial\theta}\right)^2. \tag{9}$$

Second, we consider the case that $P(n|\theta+\epsilon,N) < P(n|\theta,N)$. Now, adopting the same reasoning as used previously, the observer assigns $p(\theta) = n/N$ and the Taylor series expansion of Eq. (8) yields

$$\frac{1}{N}\ln\frac{P(n|\theta+\epsilon,N)}{P(n|\theta,N)} \approx -\frac{\epsilon^2}{2p(\theta)(1-p(\theta))}\left(\frac{\partial p(\theta)}{\partial\theta}\right)^2. \tag{10}$$



As $\epsilon$ was arbitrary (but small), minimization of $|L|$ is equivalent to minimizing the Fisher information [9, 10]

$$I_F = \frac{1}{p(\theta)(1-p(\theta))}\left(\frac{\partial p(\theta)}{\partial\theta}\right)^2 \tag{11}$$

for this particular problem. Thus, we conclude that the first criterion tells us that we should minimize the Fisher information $I_F$. Substituting $p(\theta) = \cos^2 g(\theta)$ we obtain

$$I_F = 4\left(\frac{\partial g(\theta)}{\partial\theta}\right)^2. \tag{12}$$



Criterion 2 of Section I stipulates that the reliability of the procedure to extract $\phi$ from the observed sequence of $x_i$'s should not depend on $\psi - \phi$. We can realize this by choosing $g(\theta) = a\theta + b$. Using the side information that $p(+1|\theta) = p(+1|\theta+\pi)$ we find that $p(+1|\theta) = \cos^2(k\theta + b)$ with integer $k$, and $I_F = 4k^2$ for $k \ne 0$ ($k = 0$ is excluded because then $p(\theta)$ does not depend on $\theta$ and the design leads to a useless device). Clearly $I_F$ is minimal if $k = 1$ and we may absorb the irrelevant phase factor $b$ in $\phi$.
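The claim that $p(\theta) = \cos^2(k\theta + b)$ has a Fisher information of $4k^2$, independent of $\theta$, can be verified numerically. The sketch below evaluates the expression for the Fisher information given above with a central finite difference. The function names, the step size `h`, and the sample angles are illustrative assumptions, not part of the original text.

```python
import math

def fisher_information(p, theta, h=1e-5):
    """Evaluate (dp/dtheta)^2 / (p (1 - p)) at theta using a central difference."""
    dp = (p(theta + h) - p(theta - h)) / (2.0 * h)
    value = p(theta)
    return dp * dp / (value * (1.0 - value))

def malus_family(k, b=0.0):
    """Return the function p(theta) = cos^2(k * theta + b)."""
    return lambda theta: math.cos(k * theta + b) ** 2

# Sample angles chosen (arbitrarily) away from the points where p is 0 or 1.
sample_thetas = [0.3, 0.7, 1.1]
for k in (1, 2, 3):
    values = [fisher_information(malus_family(k), t) for t in sample_thetas]
    print(k, [round(v, 6) for v in values], "expected", 4 * k * k)
```

For each $k$ the printed values should agree with $4k^2$ at every sample angle, which illustrates both criteria at once: the sensitivity does not depend on $\theta$, and it is smallest for $k = 1$.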

In summary, using the two design criteria of Section I, we find that for optimal operation (from the point of view of the observer), the processor should use the probabilities

$$p(-1|\theta) = \sin^2\theta, \qquad p(+1|\theta) = \cos^2\theta, \tag{13}$$

to generate the $-1$ and $+1$ events, respectively. Put differently, for a fixed processor setting $\phi$ and $N$ incoming events with message $\psi$, the observer will (in general) get most out of the data if the processor sends $N\cos^2(\psi - \phi)$ ($N\sin^2(\psi - \phi)$) events to the apparatus that detects the $+1$ ($-1$) event. The maximum number $M_D$ of angles $\psi - \phi$ we can distinguish is given by

$$M_D = a\sqrt{N}, \tag{14}$$

where $a$ depends on the number of mistakes in determining $\psi - \phi$ that we find acceptable. The larger $a$, the larger is the probability that the result for $\psi - \phi$ is erroneous.

Obviously, it is easy to simulate this processor on a computer. For each of the $i = 1, \ldots, N$ input events, we generate a uniform random number $0 < r < 1$ and send out a $x_i = +1$ ($x_i = -1$) event if $r < \cos^2\theta$ ($r \ge \cos^2\theta$), so that $+1$ events occur with probability $\cos^2\theta$ as required by Eq. (13). After processing $N$ events, we compute $\theta = \psi - \phi$ from $\cos^2\theta = (1 + N^{-1}\sum_{i=1}^{N} x_i)/2$.
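The simulation procedure just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names, the seed, the angle of 30 degrees, and the number of events are arbitrary choices made here, and Python's standard `random` module stands in for whatever random number generator one prefers.

```python
import math
import random

def probabilistic_processor(theta, n_events, rng):
    """Generate n_events outputs x = +1 or -1 with p(+1|theta) = cos^2(theta)."""
    p_plus = math.cos(theta) ** 2
    return [1 if rng.random() < p_plus else -1 for _ in range(n_events)]

def estimate_cos2(events):
    """Estimate cos^2(theta) from the record of events via (1 + mean(x)) / 2."""
    return (1.0 + sum(events) / len(events)) / 2.0

rng = random.Random(12345)      # fixed seed (arbitrary) so the run is reproducible
theta = math.radians(30.0)      # theta = psi - phi, chosen for illustration
events = probabilistic_processor(theta, 10000, rng)
estimate = estimate_cos2(events)
print("estimated cos^2(theta):", estimate)
print("exact cos^2(theta):    ", math.cos(theta) ** 2)
```

The estimate fluctuates around the exact value with a standard deviation of order $1/\sqrt{N}$, consistent with the scaling of Eq. (14).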

A. Relation to physics 

Up to this point, there is no relation between the mathematical model that we have analyzed and a physical system. However, from the description of the scenario and the final result Eq. (13), it is obvious that a processor that operates according to Eq. (13) is a model for an ideal polarizer. We now discuss the relation between the optimal probabilistic processor and the measurement of the polarization of photons in more detail.

In classical electrodynamics, it is well known that the intensity of light transmitted by a polarizer (such as a Nicol prism) is given by Malus' law

$$I_o = I\sin^2(\psi - \phi), \qquad I_e = I\cos^2(\psi - \phi), \tag{15}$$

where $I$, $I_o$, and $I_e$ are the intensities of the incident light, the ordinary and extraordinary ray, respectively, $\psi$ is the polarization of the incident light and $\phi$ specifies the orientation of the polarizer. From a quantum mechanical point of view, the total energy $E$ of a light wave of frequency $f$ must be an integer multiple of $hf$ ($h$ is Planck's constant), that is $E = nhf$, where $n$ is the number of photons in the wave. The polarizer splits the incoming beam in two beams. Depending on the type of polarizer, the light in one of the beams is absorbed [3] but this is irrelevant for the discussion that follows. In any case, the number of photons in each beam is an integer (by definition of the concept of a photon, there is no such thing as a half photon). If the number of photons in the incident beam is very large, the mean number of photons that goes into each beam should correspond to the intensity that we find from classical electrodynamics. In the regime where the photons are detected one-by-one, quantum mechanics postulates that the polarizer sends a photon to the ordinary (extraordinary) direction with probability $\sin^2(\psi - \phi)$ ($\cos^2(\psi - \phi)$).

The probabilistic processor that we have described transforms a beam of photons into yes/no events that we can count. If we require the answers of the transformation process to be probabilistic (Bernoulli trials), rotationally invariant (a basic property of (quantum) electrodynamics), and to satisfy criteria 1 and 2 of Section I, then the device that performs the transformation will produce data that agrees with Malus' law. We did not invoke any law of physics to obtain this result: Malus' law was recovered as the result of efficient data processing. This raises the interesting question whether other quantum phenomena also appear as the result of efficient data processing.

The hypothesis that efficient processing of statistical information may be the reason why we observe quantum mechanical phenomena is very explicit in the work of Frieden [10], Wootters, and Summhammer [13]. Frieden has shown that one can recover all the fundamental equations of physics by finding the extrema of the Fisher information plus the "bound" information [10]. According to Frieden, the act of measurement elicits a physical law and quantum mechanics appears as the result of what Frieden calls "a smart measurement", a measurement that tries to make the best estimate [10]. Although this approach is similar to ours, our line of reasoning is different. We do not invoke concepts from estimation theory, such as estimators and the Cramér-Rao inequality (see Appendix A), nor do we require the concept of random noise. Furthermore, in Frieden's formulation, the parameters to be estimated (such as the position) are of the same kind as the measured quantities. This is not the case for the photon polarization that we treat here. In our approach, the measuring apparatus (such as the calcite crystal acting as a polarizer) transforms the input (the photon polarization) into a signal ($x = \pm 1$) that can be detected by human beings. The requirement that the simple probabilistic processor that transforms the data operates with optimal efficiency yields Malus' law.

The fundamental difference between Frieden's approach and ours becomes evident by noting that there is no reason why we should limit our search for efficient transformation devices to the most simple, Bernoulli-type probabilistic machines. As we explain later, these machines can simulate the classical and quantum properties of a photon polarizer but are incapable of simulating interference phenomena. One possible route to solve this problem might be to generalize the probabilistic machine such that it no longer generates Bernoulli events, that is, to allow for correlations between output events. We do not follow this route. Instead we consider the most extreme solution, namely a deterministic processor that performs the same task as the probabilistic machine under the conditions specified in Section I. This forces us to consider deterministic algorithms with primitive learning capabilities (to allow for correlations between output events). Elsewhere, we have shown that these deterministic processors (and probabilistic versions thereof) can be used to reproduce quantum interference phenomena. We come back to this topic in Section IV.



III. DETERMINISTIC PROCESSOR 

From an engineering point of view, the probabilistic processor of Section II is extremely simple and has a relatively poor performance. Using $N$ bits, the probabilistic processor can encode $M_D \propto \sqrt{N}$ distinguishable messages only. For example, as shown in Section II, if we demand the level of certainty of 99.7%, then $M_D \approx \sqrt{N}/6$.

It is not unreasonable to expect that a deterministic machine can do better in this respect. Therefore, the obvious question is whether there exists a deterministic processor that generates events according to Malus' law. Apart from being deterministic, this processor should satisfy the two criteria that we specified in Section I.

Adopting the terminology introduced in our earlier work, we refer to the deterministic processor that we describe in this section as a deterministic learning machine (DLM). For this machine, $M_D = N + 1$ with nearly 100% certainty.

In this paper, we analyze a DLM that has one input channel, two output channels and one internal vector with two real entries. A DLM responds to an input event by choosing, from all possible alternatives, the internal state that minimizes a cost function (to be defined later) that depends on the input and the internal state itself. Then the DLM sends a message through one of its output channels. The message contains information about the decision the DLM took while it updated its internal state and, depending on the application, also contains other data that the DLM may have. By updating its internal state, the DLM "learns" about the input it receives and by sending messages through one of its two output channels, it tells its environment about what it has learned. A DLM is a machine that performs real-time recurrent learning [18].

This section consists of three parts. First, we specify the algorithm that is used by a DLM and we show that in the stationary regime, the number of $-1$ ($+1$) events in a sequence of $N$ events is given by Malus' law, see Eq. (15). Then, we present a detailed mathematical analysis of the dynamic properties of a DLM. The reader who is not interested in the intricacies of this classical dynamical system can skip Section III B. We end this section by comparing the performance of the probabilistic and deterministic processor.



A. Deterministic Learning Machines 

The schematic diagram of the DLM is the same as that of the probabilistic processor of Fig. 2 except that there is no probabilistic process $p(x|\psi - \phi)$. The DLM receives as input a sequence of angles $\psi_{n+1}$ for $n = 0, \ldots, N$ and also knows about the orientation of the device through the angle $\phi$. Using rotational invariance, we represent these input messages by unit vectors $y_{n+1} = (y_{1,n+1}, y_{2,n+1})$ where

$$y_{1,n+1} = \cos\theta_{n+1}, \qquad y_{2,n+1} = \sin\theta_{n+1}, \tag{16}$$

and $\theta_n = \psi_n - \phi$. The fact that Eq. (16) depends on the relative difference of the angles guarantees that the deterministic process is rotationally invariant. Instead of the random number generator that is part of the probabilistic processor, the DLM has an internal degree of freedom that we represent by the unit vector $x_{n+1} = (x_{1,n+1}, x_{2,n+1})$. As the DLM receives input data, it updates its internal state. For all $n > 0$, the update rules are defined by

$$x_{1,n+1} = \alpha x_{1,n} + \beta(1 - \Theta_{n+1}), \qquad x_{2,n+1} = \alpha x_{2,n} + \beta\Theta_{n+1}, \tag{17}$$

where $\Theta_{n+1} = 0$ ($1$) corresponds to a $-1$ ($+1$) output event, and $0 < \alpha < 1$ is a parameter that controls the learning process of the DLM. The requirement that the internal vector $x_{n+1} = (x_{1,n+1}, x_{2,n+1})$ stays on the unit circle yields

$$\beta = \pm\sqrt{1 + \alpha^2\left[x_{1,n}^2(1 - \Theta_{n+1}) + x_{2,n}^2\Theta_{n+1} - 1\right]} - \alpha\left[x_{1,n}(1 - \Theta_{n+1}) + x_{2,n}\Theta_{n+1}\right]. \tag{18}$$

Substitution of Eq. (18) in Eq. (17) gives us four different rules:

$$\begin{aligned}
x_{1,n+1} &= +\sqrt{1 + \alpha^2(x_{1,n}^2 - 1)}, & x_{2,n+1} &= \alpha x_{2,n}, \\
x_{1,n+1} &= -\sqrt{1 + \alpha^2(x_{1,n}^2 - 1)}, & x_{2,n+1} &= \alpha x_{2,n}, \\
x_{1,n+1} &= \alpha x_{1,n}, & x_{2,n+1} &= +\sqrt{1 + \alpha^2(x_{2,n}^2 - 1)}, \\
x_{1,n+1} &= \alpha x_{1,n}, & x_{2,n+1} &= -\sqrt{1 + \alpha^2(x_{2,n}^2 - 1)},
\end{aligned} \tag{19}$$

where the first (last) two rules correspond to the choice $\Theta_{n+1} = 0$ ($\Theta_{n+1} = 1$) and the $\pm$-sign takes care of the fact that for each choice of $\Theta_{n+1}$, the DLM has to decide between two quadrants. For later, it is important to note that $|x_{1,n+1}| \ge |x_{1,n}|$ and $|x_{2,n+1}| \le |x_{2,n}|$ if $\Theta_{n+1} = 0$. In other words, the angle of the internal vector relative to the $x$-axis decreases if we apply the $\Theta_{n+1} = 0$ rules. The DLM selects one of the four rules in Eq. (19) by minimizing the cost function defined by

$$C = -x_{n+1} \cdot y_{n+1} = (x_{n+1} - y_{n+1})^2/2 - 1 = -(x_{1,n+1}y_{1,n+1} + x_{2,n+1}y_{2,n+1}). \tag{20}$$



FIG. 3: (color online) Time evolution of the angle $\theta_n = \arctan(x_{2,n}/x_{1,n})$ (in degrees) representing the internal vector $x_n$ of the DLM defined by Eqs. (19) and (20), as a function of $n$. Bullets (red): Input events carry vectors $y_{n+1} = (\cos 60°, \sin 60°)$. The initial value $\theta_0 \approx 81°$. For $n > 20$ the ratio of the number of increments ($\Theta_{n+1} = 1$) to decrements ($\Theta_{n+1} = 0$) is exactly 3/1, which is $(\sin 60°/\cos 60°)^2$. Squares (blue): Input events carry vectors $y_{n+1} = (\cos 30°, \sin 30°)$. The initial value $\theta_0 \approx 327°$. For $n > 60$ the ratio of the number of increments ($\Theta_{n+1} = 1$) to decrements ($\Theta_{n+1} = 0$) is exactly 1/3, which is $(\sin 30°/\cos 30°)^2$. The direction of the initial vectors $x_0$ is chosen at random. In this simulation $\alpha = 0.99$. Data for $n < 10$ has been omitted to show the oscillating behavior more clearly. Lines are guides to the eye.



Obviously, the cost $C$ is small if the vectors $x_{n+1}$ and $y_{n+1}$ are close to each other. Summarizing: a DLM minimizes the distance between the input vector and its internal vector by means of a simple, deterministic decision process.
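The decision process described in this subsection can be sketched as follows: at each event, the machine forms the four candidate states (two with the first component replaced by a square root, two with the second), evaluates the cost (minus the inner product with the input vector) for each, and keeps the cheapest. This is a minimal illustration, not the authors' code. The function names, the starting angle of 81 degrees, the fixed input angle of 60 degrees, and the run length are assumptions made here; $\alpha = 0.99$ follows the value quoted in the caption of Fig. 3.

```python
import math

def dlm_step(x, y, alpha):
    """Apply one DLM update and return (new_state, big_theta).

    The four candidates are the four update rules; the one with the
    lowest cost C = -(new_state . y) is selected.
    """
    x1, x2 = x
    root1 = math.sqrt(1.0 + alpha * alpha * (x1 * x1 - 1.0))
    root2 = math.sqrt(1.0 + alpha * alpha * (x2 * x2 - 1.0))
    candidates = [
        ((+root1, alpha * x2), 0),   # big_theta = 0 rules
        ((-root1, alpha * x2), 0),
        ((alpha * x1, +root2), 1),   # big_theta = 1 rules
        ((alpha * x1, -root2), 1),
    ]

    def cost(candidate):
        (c1, c2), _ = candidate
        return -(c1 * y[0] + c2 * y[1])

    return min(candidates, key=cost)

def run_dlm(theta_deg, alpha=0.99, n_events=1000, start_deg=81.0):
    """Feed n_events identical input vectors to a DLM and record its decisions."""
    theta = math.radians(theta_deg)
    y = (math.cos(theta), math.sin(theta))
    start = math.radians(start_deg)
    x = (math.cos(start), math.sin(start))
    outputs = []
    for _ in range(n_events):
        x, big_theta = dlm_step(x, y, alpha)
        outputs.append(big_theta)
    return x, outputs

x_final, outputs = run_dlm(60.0)
tail = outputs[len(outputs) // 2:]            # discard the transient
fraction_ones = sum(tail) / len(tail)
print("fraction of big_theta = 1 events:", fraction_ones)
print("sin^2(60 degrees):               ", math.sin(math.radians(60.0)) ** 2)
```

For a fixed input at 60 degrees, the fraction of $\Theta_{n+1} = 1$ events in the second half of the run should be close to $\sin^2 60° = 3/4$, which corresponds to the 3/1 ratio of increments to decrements quoted in the caption of Fig. 3.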

In general, the behavior of the DLM defined by rules Eqs. (19) and (20) is difficult to analyze without the use of a computer. However, for a fixed input vector $y_{n+1} = y$, it is clear what the DLM will try to do: It will minimize the cost Eq. (20) by rotating its internal vector $x_{n+1}$ to bring it as close as possible to $y$. However, $x_{n+1}$ will not converge to a limiting value but instead it will keep oscillating about the input value $y$. An example of a simulation is given in Fig. 3. In general, for a fixed input vector $y_{n+1} = y$ the DLM will reach a state in which its internal vector oscillates about $y$. This is the stationary state of the machine. Obviously, the whole process is deterministic. The details of the approach to the stationary state depend on the initial value of the internal vector $x_0$, but the properties of the stationary state do not.



1. Stationary state 

The stationary-state analysis is a very useful tool to understand the behavior of the DLMs. Let us assume that $0 \ll \alpha < 1$ and that we have reached the stationary regime in which the internal vector performs small oscillations about $(\cos\theta, \sin\theta)$ (as in Fig. 3). For simplicity, but without loss of generality, we limit the discussion that follows to $0 < \theta < \pi/2$. For $\Theta_{n+1} = 0$ we substitute $x_{2,n} = \sin\varphi_n$ and $\varphi_{n+1} = \varphi_n + \delta_0$ in Eq. (19) and obtain

$$\sin^2\varphi_n + 2\delta_0\sin\varphi_n\cos\varphi_n = \alpha^2\sin^2\varphi_n. \tag{21}$$

Similarly, for $\Theta_{n+1} = 1$ we substitute $x_{2,n} = \sin\varphi_n$ and $\varphi_{n+1} = \varphi_n + \delta_1$ in Eq. (19) and obtain

$$\sin^2\varphi_n + 2\delta_1\sin\varphi_n\cos\varphi_n = -\alpha^2\cos^2\varphi_n + 1. \tag{22}$$

In deriving Eqs. (21) and (22), we neglected terms of order $\delta_0^2$ and $\delta_1^2$, respectively. Rearranging Eqs. (21) and (22), and using $\varphi_n \approx \theta$ gives

$$\delta_0 = -\frac{1 - \alpha^2}{2}\,\frac{\sin\theta}{\cos\theta} \quad \text{if } \Theta_{n+1} = 0, \qquad \delta_1 = +\frac{1 - \alpha^2}{2}\,\frac{\cos\theta}{\sin\theta} \quad \text{if } \Theta_{n+1} = 1. \tag{23}$$

In the stationary regime, the sum of all increments of $\varphi_n$ should be compensated by the sum of all decrements of $\varphi_n$. Therefore, we must have $N_0\delta_0 + N_1\delta_1 \approx 0$ where $N_0$ ($N_1$) is the number of $\Theta_{n+1} = 0$ ($\Theta_{n+1} = 1$) events. From Eq. (23) it follows immediately that

$$\frac{N_1}{N_0} = \tan^2\theta, \tag{24}$$

and hence

$$\frac{N_1}{N_0 + N_1} = \sin^2\theta, \qquad \frac{N_0}{N_0 + N_1} = \cos^2\theta. \tag{25}$$

FIG. 4: The number of ($\Theta_{n+1} = 1$) events divided by the total number of events as a function of the value of the input variable $\theta$. Bullets: Each data point is obtained from a DLM simulation of 1000 events with a fixed, randomly chosen value of $0 \le \phi < 360°$, using the last 500 events to count the number of ($\Theta_{n+1} = 1$) events. Solid line: $\cos^2\theta$.



Fig. 4 shows that the simulation results generated by the DLM are in excellent agreement with the expressions obtained from this simple analysis. In fact, we will see later that in the stationary state, a DLM can encode exactly all angles for which $\sin^2\theta = n/N$ where $n = 0, \ldots, N$. From the definition of the DLM algorithm and Eq. (25), it is clear that the requirements of rotational invariance and insensitivity with respect to small changes in $\theta = \psi - \phi$ (criterion 1 of Section I) are automatically satisfied. We emphasize that Eq. (25) is not put into the DLM algorithm but results from the learning process itself.

Comparing Eq. (15) and Eq. (25), we conclude that once the DLM has reached a stationary state, the number of $+1$ and $-1$ output events in a sequence of $N = N_0 + N_1$ events agrees with Malus' law. Of course, the order in which the DLM generates the $+1$ and $-1$ events is strictly deterministic. Anticipating that we will show that a DLM is a very efficient machine, what is most striking is that the number of $-1$ and $+1$ events it generates is proportional to $\sin^2(\psi - \phi)$ and $\cos^2(\psi - \phi)$, respectively, just as in the case of the simple probabilistic processor and in the classical electrodynamical and quantum mechanical description of the polarizer.



B. Analysis of the dynamic properties 

For a more detailed mathematical analysis of the dynamics of a DLM, it is convenient to write the update rules Eq. (19) as linear difference equations. Actually, we need only

$$x_{2,n+1}^2 = \alpha^2 x_{2,n}^2 + (1 - \alpha^2)\Theta_{n+1}. \tag{26}$$

For simplicity, we restrict the discussion that follows to the case $0 < \theta < \pi/4$. Other cases can be treated in the same manner.

Substituting $x_{2,n} = \sin\varphi_n$ in Eq. (26), we obtain

$$\sin^2\varphi_{n+1} = \alpha^2\sin^2\varphi_n + (1 - \alpha^2)\Theta_{n+1}, \tag{27}$$

showing that Eq. (26) has the structure of a so-called circle map. Thus, the study of the behavior of the circle map Eq. (27) will give us insight into the dynamic properties of the DLM. Fig. 5 shows an example of circle-map analysis for the case of a fixed input angle of 30°.



FIG. 5: (color online) Circle map of the time evolution of $x_{2,n}^2$ for the case of a fixed input angle of 30° (horizontal axis: $x_{2,n}^2$; vertical axis: $x_{2,n+1}^2$). The dashed (green) line shows the evolution of the mapping $x_{2,n+1}^2 = F(x_{2,n}^2)$ for $n < 100$. For clarity, we omitted the first 12 iterations because this allows us to show in detail how the mapping converges to a unique polygon. The function $F(x^2)$ is defined by the rules and cost function Eq. (19) and Eq. (20), respectively. The dotted (blue) line separates the case $\Theta_{n+1} = 0$ from the case $\Theta_{n+1} = 1$ and is given by $y = \alpha^2 x + 1 - \alpha^2$ for $\Theta_{n+1} = 1$ ($x < 1/4$) and $y = \alpha^2 x$ for $\Theta_{n+1} = 0$ ($x > 1/4$). The straight solid (red) line is given by $y = x$. The solid (red) line forming the polygon with eight vertices shows the results for $9900 < n < 10000$: In this case the system has reached the stationary state with a period of four. In this simulation, $\alpha = 0.99$.



1. Illustrative example

Let us assume that we have reached a stationary state and that the DLM repeats a sequence $\{00 \ldots 00100 \ldots 00100 \ldots\}$ in which there are $K$ successive events of the type $\Theta_{n+1} = 0$ (decreasing $x_{2,j}^2$) and one $\Theta_{n+1} = 1$ event (increasing $x_{2,j}^2$). Let us denote by $x_2$ the value of $x_{2,n+1}$ before the first of the $K$ events of type 0. From Eq. (26) we obtain

$$\begin{aligned}
x_{2,1}^2 &= \alpha^2 x_2^2, \\
&\;\;\vdots \\
x_{2,K}^2 &= \alpha^{2K} x_2^2, \\
x_{2,K+1}^2 &= \alpha^{2K+2} x_2^2 + 1 - \alpha^2.
\end{aligned} \tag{28}$$

As the DLM repeats the same sequence over and over again, we have $x_{2,K+1}^2 = x_2^2$. In other words, if we observe the repeated sequence $\{00 \ldots 01\}$ of length $K+1$, we must have

$$x_2^2 = \frac{1 - \alpha^2}{1 - \alpha^{2K+2}}. \tag{29}$$

Furthermore, as $x_{2,j}^2 = \alpha^{2j} x_2^2$ for $j = 0, \ldots, K$, the mean value of the $x_{2,j}^2$'s during the sequence is given by

$$\frac{1}{K+1}\sum_{j=0}^{K} x_{2,j}^2 = \frac{1}{K+1} = \sin^2\theta, \tag{30}$$

in agreement with Eq. (25). From Eq. (30), we conclude that the DLM can encode the values $\theta = \arctan(1/\sqrt{K})$ with periodic sequences of the form $\{00 \ldots 01\}$.
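The periodic-orbit argument can be checked with a few lines of arithmetic. The sketch below starts from the closed-form value of $x_2^2$ for a period of $K$ zeros followed by a single one, iterates the recursion $x_{2,n+1}^2 = \alpha^2 x_{2,n}^2 + (1 - \alpha^2)\Theta_{n+1}$ over one period, and confirms that the orbit closes and that the mean of $x_{2,j}^2$ equals $1/(K+1)$. The function names and the choice $\alpha = 0.99$ are illustrative assumptions.

```python
import math

def fixed_point(alpha, k):
    """Value of x_2^2 at the start of a period of k zeros followed by one 1."""
    return (1.0 - alpha ** 2) / (1.0 - alpha ** (2 * k + 2))

def iterate_period(alpha, k, s):
    """Iterate s -> alpha^2 s + (1 - alpha^2) * big_theta over one period.

    Returns the value after the period and the k + 1 values visited during it.
    """
    visited = []
    for j in range(k + 1):
        visited.append(s)
        big_theta = 1 if j == k else 0      # k zeros, then a single one
        s = alpha ** 2 * s + (1.0 - alpha ** 2) * big_theta
    return s, visited

alpha = 0.99
for k in (1, 2, 3):
    start = fixed_point(alpha, k)
    end, visited = iterate_period(alpha, k, start)
    mean = sum(visited) / len(visited)
    angle = math.degrees(math.atan(1.0 / math.sqrt(k)))
    print(k, end - start, mean, 1.0 / (k + 1), round(angle, 2))
```

The difference `end - start` should vanish up to rounding, the mean should equal $1/(K+1)$, and the encoded angles for $K = 1$ and $K = 2$ are 45 degrees and about 35.26 degrees.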

From this analysis we conclude that if we were to limit the design of the device such that it can only generate sequences of the form {00...01}, then, after observing two ones and counting the zeros between them, we can determine the angle with an error of less than 5 degrees. The worst case occurs when the sequence is {010101010...} (45°) or {0010010010...} (35°). Clearly, even with this limitation ($K$ zeros followed by one 1) on the design, this is already a very efficient method to encode the angle.
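The two statements of this example, the mean value $1/(K+1)$ of Eq. (30) and the worst-case error of less than 5 degrees, can be checked numerically. The sketch below is only an illustration; the helper names `stationary_orbit` and `encoded_angle_deg` are hypothetical, and the worst-case error is taken as half the distance between neighboring encodable angles.

```python
import math

def stationary_orbit(K, alpha):
    """Values of x2^2 over one period of {0...01} with K zeros, cf. Eqs. (28), (29)."""
    x2_hat = (1 - alpha**2) / (1 - alpha ** (2 * K + 2))  # Eq. (29)
    return [alpha ** (2 * j) * x2_hat for j in range(K + 1)]

def encoded_angle_deg(K):
    """Angle represented by the periodic sequence with K zeros and one 1."""
    return math.degrees(math.atan(1 / math.sqrt(K)))

orbit = stationary_orbit(3, 0.99)
print("mean of x2^2 for K = 3:", sum(orbit) / len(orbit))

angles = [encoded_angle_deg(K) for K in range(1, 200)]
# Half the spacing between neighboring angles bounds the decoding error.
half_gaps = [(a - b) / 2 for a, b in zip(angles, angles[1:])]
print("largest half-gap in degrees:", max(half_gaps))
```

The largest half-gap occurs between $K = 1$ (45°) and $K = 2$ (about 35.3°), which is the worst case mentioned in the text.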

We now extend this analysis to a general periodic sequence.



2. Minimum angle 

First we show how the control parameter $\alpha$ limits the accuracy with which we can represent the stationary state. Let us assume that the fixed input vector is given by $y = (y_1, y_2)$ and that for some index $n$, the machine is in the state $x_n = (1,0)$, as illustrated in Fig. 6 (the cases $(0,1)$, $(-1,0)$, and $(0,-1)$ can be treated in the same manner and lead to the same conclusion).

If the machine applies the update rule $\Theta_{n+1} = 0$, the new state and the cost are given by

$$x_{1,n+1} = \sqrt{1 + \alpha^2(x_{1,n}^2 - 1)} = 1, \qquad x_{2,n+1} = \alpha x_{2,n} = 0, \qquad C = -y_1. \tag{31}$$

FIG. 6: (color online) Illustration of a situation in which the machine remains in the state $x = (1,0)$. The input vector is $y = (y_1, y_2)$. The internal state is $x_n = (1,0)$, and the new internal state is either $x_{n+1} = (\alpha, \sqrt{1-\alpha^2})$ or $x_{n+1} = (1,0)$. In general, the smallest angle $\theta_{\min}$ for which the machine remains in the state $x = (1,0)$ depends on the value of the parameter $\alpha$, see Eq. (35).



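The cost $C$ has to be compared to the cost of applying the update rule $\Theta_{n+1} = 1$, in which case we have

$$x_{1,n+1} = \alpha x_{1,n} = \alpha, \qquad x_{2,n+1} = \sqrt{1 + \alpha^2(x_{2,n}^2 - 1)} = \sqrt{1-\alpha^2}, \qquad C = -\left(\alpha y_1 + y_2\sqrt{1-\alpha^2}\right). \tag{32}$$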



Note that the point $(1,0)$ is somewhat special in the sense that the machine remains at $(1,0)$ if it applies the update rule $\Theta_{n+1} = 0$. The machine stays at $(1,0)$ (forever) unless the cost of applying the update rule $\Theta_{n+1} = 1$ is less than the cost of applying the update rule $\Theta_{n+1} = 0$. From Eqs. (31) and (32), the necessary condition for the machine not to get stuck at $(1,0)$ is

$$\alpha y_1 + y_2\sqrt{1-\alpha^2} > y_1. \tag{33}$$

Rearranging Eq. (33) yields

$$\tan\theta = \frac{y_2}{y_1} > \sqrt{\frac{1-\alpha}{1+\alpha}}. \tag{34}$$

Thus, Eq. (34) shows that we cannot represent angles $\theta$ that are smaller than

$$\theta_{\min} = \arctan\sqrt{(1-\alpha)/(1+\alpha)}. \tag{35}$$

For $\alpha = 0.99$ ($0.999$), typical values used in simulations, $\theta_{\min} = 4.05°$ ($1.28°$). Note that $\theta_{\min}$ does not determine the accuracy in the interval $[\theta_{\min}, \pi/4]$.
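As a quick numerical check of Eqs. (33) and (35), the following sketch evaluates $\theta_{\min}$ for the two values of $\alpha$ quoted above and tests the escape condition directly. The function names `theta_min_deg` and `can_leave_pole` are hypothetical and introduced only for this illustration.

```python
import math

def theta_min_deg(alpha):
    """Smallest representable angle in degrees, Eq. (35)."""
    return math.degrees(math.atan(math.sqrt((1 - alpha) / (1 + alpha))))

def can_leave_pole(theta_deg, alpha):
    """Condition of Eq. (33): the Theta = 1 update is cheaper than staying at (1, 0)."""
    theta = math.radians(theta_deg)
    y1, y2 = math.cos(theta), math.sin(theta)
    return alpha * y1 + y2 * math.sqrt(1 - alpha**2) > y1

for alpha in (0.99, 0.999):
    print(alpha, round(theta_min_deg(alpha), 2))

# Input angles just above and just below theta_min for alpha = 0.99.
print(can_leave_pole(5.0, 0.99), can_leave_pole(3.0, 0.99))
```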



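3. Periodic sequences: General case

We now consider situations in which the sequence of events consists of a repetition of the sequence $\{\Theta_{n+1}, \Theta_{n+2}, \ldots, \Theta_{n+N};\ \Theta_n = \Theta_{n+N}\}$ of length $N$. First, we determine the solution $\hat{x}_{2,n}^2$ of $\hat{x}_{2,n}^2 = \hat{x}_{2,n+N}^2$ (implying $\hat{x}_{1,n}^2 = \hat{x}_{1,n+N}^2$). The formal solution of the recursion $x_{2,n+1}^2 = \alpha^2 x_{2,n}^2 + (1-\alpha^2)\Theta_{n+1}$ is given by

$$x_{2,n+k}^2 = \alpha^{2k}x_{2,n}^2 + (1-\alpha^2)\sum_{j=1}^{k}\alpha^{2(k-j)}\Theta_{n+j}, \tag{36}$$

and the requirement $\hat{x}_{2,n}^2 = \hat{x}_{2,n+N}^2$ yields

$$\hat{x}_{2,n}^2 = \hat{x}_{2,n+N}^2 = \frac{1-\alpha^2}{1-\alpha^{2N}}\sum_{j=1}^{N}\alpha^{2(N-j)}\Theta_{n+j}. \tag{37}$$

We conclude that if the machine starts from $\hat{x}_{2,n}^2$ and generates the events $\{\Theta_{n+1}, \Theta_{n+2}, \ldots, \Theta_{n+N}\}$, it returns to the starting point $\hat{x}_{2,n}^2$. For each pattern $\{\Theta_{n+1}, \Theta_{n+2}, \ldots, \Theta_{n+N}\}$, there exists such a point $\hat{x}_{2,n}^2$. In other words, if the machine is in the state $\hat{x}_{2,n}^2$, repeating the sequence $\{\Theta_{n+1}, \Theta_{n+2}, \ldots, \Theta_{n+N}\}$ generates a periodic motion of $\hat{x}_{2,n+k}^2$ for $k > 0$ with period $N$.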

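Second, we consider the situation in which the machine starts from $\hat{x}_{2,n}^2 + \epsilon$ and we keep feeding the machine with the periodic sequence $\{\Theta_{n+1}, \Theta_{n+2}, \ldots, \Theta_{n+N}\}$. Using the general expression Eq. (36), we find

$$\begin{aligned}
x_{2,n+pN}^2 &= \alpha^{2pN}\hat{x}_{2,n}^2 + \alpha^{2pN}\epsilon + (1-\alpha^2)\sum_{j=1}^{pN}\alpha^{2(pN-j)}\Theta_{n+j} \\
&= \alpha^{2N}\hat{x}_{2,n+(p-1)N}^2 + \alpha^{2pN}\epsilon + (1-\alpha^2)\sum_{j=1}^{N}\alpha^{2(N-j)}\Theta_{n+j} \\
&= \alpha^{2N}\hat{x}_{2,n}^2 + \alpha^{2pN}\epsilon + (1-\alpha^2)\sum_{j=1}^{N}\alpha^{2(N-j)}\Theta_{n+j},
\end{aligned} \tag{38}$$

where $p$ denotes the number of times the machine processes the periodic sequence $\{\Theta_{n+1}, \Theta_{n+2}, \ldots, \Theta_{n+N}\}$. As $\alpha < 1$, $\lim_{p\to\infty} x_{2,n+pN}^2 = \hat{x}_{2,n}^2$, independent of the choice of $\epsilon$. Therefore, for any periodic sequence $\{\Theta_{n+1}, \Theta_{n+2}, \ldots, \Theta_{n+N};\ \Theta_n = \Theta_{n+N}\}$ of length $N$, the corresponding sequence $\{x_{2,j+1}^2, x_{2,j+2}^2, \ldots, x_{2,j+N}^2\}$ converges exponentially fast to the periodic sequence $\{\hat{x}_{2,j+1}^2, \hat{x}_{2,j+2}^2, \ldots, \hat{x}_{2,j+N}^2\}$. From the recursion for $x_{2,n}^2$ it then follows that

$$\sum_{i=0}^{N-1}\hat{x}_{2,n+i+1}^2 = \alpha^2\sum_{i=0}^{N-1}\hat{x}_{2,n+i}^2 + (1-\alpha^2)\sum_{i=0}^{N-1}\Theta_{n+i+1}, \tag{39}$$

and using $\hat{x}_{2,n+N} = \hat{x}_{2,n}$ we find

$$\frac{1}{N}\sum_{i=1}^{N}\hat{x}_{2,n+i}^2 = \frac{1}{N}\sum_{i=1}^{N}\Theta_{n+i} \equiv \Theta. \tag{40}$$

Note that $\Theta$ is a rational number and that according to Eq. (25) we have $\Theta \approx \sin^2\theta$.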
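The three statements of this subsection (existence of the periodic point of Eq. (37), exponential convergence from a perturbed start as in Eq. (38), and the mean value of Eq. (40)) are easy to verify numerically. The sketch below is an illustration only; the sequence, the value of $\alpha$ and the function names are arbitrary choices made here.

```python
def fixed_point(seq, alpha):
    """x2^2 just before the first event of the periodic 0/1 sequence, Eq. (37)."""
    N = len(seq)
    weight = sum(alpha ** (2 * (N - j)) * seq[j - 1] for j in range(1, N + 1))
    return (1 - alpha**2) / (1 - alpha ** (2 * N)) * weight

def feed(x2_sq, seq, alpha, periods=1):
    """Apply x2^2 -> alpha^2 x2^2 + (1 - alpha^2) Theta for `periods` repetitions of seq."""
    visited = []
    for _ in range(periods):
        for th in seq:
            x2_sq = alpha**2 * x2_sq + (1 - alpha**2) * th
            visited.append(x2_sq)
    return x2_sq, visited

seq, alpha = [1, 0, 0, 1, 0, 1, 0, 0], 0.99
x_hat = fixed_point(seq, alpha)

back, orbit = feed(x_hat, seq, alpha)             # one period from the fixed point
perturbed, _ = feed(x_hat + 0.3, seq, alpha, 200)  # start away from it, 200 periods

print("return error after one period:", abs(back - x_hat))
print("distance to fixed point after 200 periods:", abs(perturbed - x_hat))
print("mean of x2^2 over one period:", sum(orbit) / len(orbit), "vs", sum(seq) / len(seq))
```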



4. Lower bound on the control parameter $\alpha$

Previously, we have tacitly assumed that we can always find the periodic sequence of $\Theta_n$'s that represents the input angle $\theta$. We now show that for a fixed input angle $\theta$, the control parameter $\alpha$ has to be large enough (but smaller than one) in order that the DLM repeats the same sequence $\{\Theta_{n+1}, \Theta_{n+2}, \ldots, \Theta_{n+N}\}$. As before, we confine the discussion to input angles that satisfy $0 < \tan\theta = y_2/y_1 < 1$. Then, the number of 0 events is larger than the number of 1 events. Without loss of generality, we may put $\Theta_{n+1} = 0$. This means that the internal state $(x_{1,n}, x_{2,n})$ of the DLM satisfies $x_{2,n} > y_2$. If the sequence is to be periodic with period $N$, we must have $x_{2,n+N} = x_{2,n}$.

So far, we did not consider the cost of going from $x_{2,n+N-1}$ to $x_{2,n}$. Denoting $z = x_{2,n+N-1}$ to simplify the expressions, the new internal states after a $\Theta_{n+N} = 0$ or $\Theta_{n+N} = 1$ event are

$$X_0 = \left(\sqrt{1-\alpha^2z^2},\ \alpha z\right), \qquad X_1 = \left(\alpha\sqrt{1-z^2},\ \sqrt{1-\alpha^2+\alpha^2z^2}\right), \tag{41}$$

respectively. According to the general rules, the DLM determines $\Theta_{n+N}$ by comparing the costs

$$C_0 = -\sqrt{1-\alpha^2z^2}\,\cos\theta - \alpha z\sin\theta, \qquad C_1 = -\alpha\sqrt{1-z^2}\,\cos\theta - \sqrt{1-\alpha^2+\alpha^2z^2}\,\sin\theta, \tag{42}$$

for the two alternative internal states of Eq. (41). The DLM generates a $\Theta_{n+N} = 1$ event if $C_1 < C_0$. After some rearrangements we obtain

$$\tan\theta > \frac{\alpha z + \sqrt{1-\alpha^2+\alpha^2z^2}}{\sqrt{1-\alpha^2z^2} + \alpha\sqrt{1-z^2}}. \tag{43}$$

In general, $z$ is a function of $\alpha$. Therefore, for a fixed $\alpha$, Eq. (43) sets an upper bound to the input angle for which the DLM can generate a periodic sequence.

As an illustration, we consider the sequence {00...001} in which there are $K$ 0 events and one 1 event. The initial point for the periodic continuation {00...00100...001...} of this sequence is given by Eq. (29). Let us assume that the DLM starts from this initial point and generates $K$ zeros, changing its internal state from $\hat{x}_2^2$ to $z^2 = \alpha^{2K}(1-\alpha^2)/(1-\alpha^{2K+2})$. The DLM will generate a $\Theta_{K+1} = 1$ event if

$$f(\alpha,K) \equiv \frac{1}{\sqrt{K}} - \frac{\alpha z + \sqrt{1-\alpha^2+\alpha^2z^2}}{\sqrt{1-\alpha^2z^2} + \alpha\sqrt{1-z^2}} > 0. \tag{44}$$

FIG. 7: Plot of the function $f(\alpha,K)$ (see Eq. (44)) as a function of $\alpha$ for $K = 57$ (solid line) and $K = 80$ (dashed line). In the stationary regime and for $f(\alpha,K) > 0$, the DLM repeats the sequence {00...001} with $K$ zeros, that is, it generates an exact representation of $K$.



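In Fig. 7 we plot $f(\alpha,K)$ as a function of $\alpha$ for $K = 57$ and $K = 80$ (plots for other values of $K$ show the same behavior). From Fig. 7 and Eq. (44), we conclude that the DLM will indeed repeat the sequence {00...001} with $K = 57$ ($K = 80$) if $0.9967 < \alpha < 1$ ($0.9983 < \alpha < 1$). Otherwise, if $\alpha$ is not within this range, the DLM generates at least one extra 0 event and the DLM does not return to the initial state $\hat{x}_2^2$. Thus, if $\alpha$ is such that $f(\alpha,K) < 0$, the DLM cannot generate the sequence that gives an exact representation of $1/K$ (although it still gives an accurate approximation).

TABLE I: Sequences $\{\Theta_1, \ldots, \Theta_q\}$ marked with a * yield the smallest variance $\Delta^2$.

$\Theta = p/q$ | $\{\Theta_1, \ldots, \Theta_q\}$ | $\hat{x}_{2,0}^2$ | $\Delta^2$
1/2 | 10* | $\alpha^2/(1+\alpha^2)$ | $(1-\alpha^2)^2/[4(1+\alpha^2)^2]$
1/3 | 100* | $\alpha^4/(1+\alpha^2+\alpha^4)$ | $2(1-\alpha^2)^2/[9(1+\alpha^2+\alpha^4)]$
1/4 | 1000* | $\alpha^6/[(1+\alpha^2)(1+\alpha^4)]$ | $(1-\alpha^2)^2(3+4\alpha^2+3\alpha^4)/[16(1+\alpha^2)^2(1+\alpha^4)]$
2/5 | 11000 | $\alpha^6(1+\alpha^2)(1-\alpha^2)/(1-\alpha^{10})$ | $2(1-\alpha^2)^2(3+4\alpha^2+3\alpha^4)/[25(1+\alpha^2+\alpha^4+\alpha^6+\alpha^8)]$
2/5 | 10100* | $\alpha^4(1+\alpha^4)(1-\alpha^2)/(1-\alpha^{10})$ | $2(1-\alpha^2)^2(3-\alpha^2+3\alpha^4)/[25(1+\alpha^2+\alpha^4+\alpha^6+\alpha^8)]$
2/8 | 11000000 | $\alpha^{12}(1-\alpha^4)/(1-\alpha^{16})$ | $(1-\alpha^2)^2(3+2\alpha^2+4\alpha^4+2\alpha^6+3\alpha^8)/[16(1+\alpha^4+\alpha^8+\alpha^{12})]$
2/8 | 10100000 | $\alpha^{10}/(1+\alpha^2+\alpha^8+\alpha^{10})$ | $(1-\alpha^2)^2(3+4\alpha^2+4\alpha^4+4\alpha^6+3\alpha^8)/[16(1+\alpha^2)^2(1+\alpha^8)]$
2/8 | 10010000 | $\alpha^8(1-\alpha^2+\alpha^4)/(1+\alpha^4+\alpha^8+\alpha^{12})$ | $(1-\alpha^2)^2(3-2\alpha^2+4\alpha^4-2\alpha^6+3\alpha^8)/[16(1+\alpha^4+\alpha^8+\alpha^{12})]$
2/8 | 10001000* | $\alpha^6(1-\alpha^2)/(1-\alpha^8)$ | $(1-\alpha^2)^2(3+4\alpha^2+3\alpha^4)/[16(1+\alpha^2)^2(1+\alpha^4)]$
3/8 | 11100000 | $\alpha^{10}(1-\alpha^6)/(1-\alpha^{16})$ | $(1-\alpha^2)^2(15+44\alpha^2+71\alpha^4+80\alpha^6+71\alpha^8+44\alpha^{10}+15\alpha^{12})/[64(1+\alpha^2)^2(1+\alpha^4+\alpha^8+\alpha^{12})]$
3/8 | 10110000 | $\alpha^8(1-\alpha^4+\alpha^6-\alpha^8)/(1-\alpha^{16})$ | $(1-\alpha^2)^2(15+28\alpha^2+39\alpha^4+48\alpha^6+39\alpha^8+28\alpha^{10}+15\alpha^{12})/[64(1+\alpha^2)^2(1+\alpha^4+\alpha^8+\alpha^{12})]$
3/8 | 10011000 | $\alpha^6(1-\alpha^4+\alpha^8-\alpha^{10})/(1-\alpha^{16})$ | $(1-\alpha^2)^2(15+28\alpha^2+23\alpha^4+16\alpha^6+23\alpha^8+28\alpha^{10}+15\alpha^{12})/[64(1+\alpha^2)^2(1+\alpha^4+\alpha^8+\alpha^{12})]$
3/8 | 11010000 | $\alpha^8(1-\alpha^2+\alpha^4-\alpha^8)/(1-\alpha^{16})$ | $(1-\alpha^2)^2(15+28\alpha^2+39\alpha^4+48\alpha^6+39\alpha^8+28\alpha^{10}+15\alpha^{12})/[64(1+\alpha^2)^2(1+\alpha^4+\alpha^8+\alpha^{12})]$
3/8 | 10101000 | $\alpha^6(1-\alpha^2)(1+\alpha^4+\alpha^8)/(1-\alpha^{16})$ | $(1-\alpha^2)^2(15+12\alpha^2+23\alpha^4+16\alpha^6+23\alpha^8+12\alpha^{10}+15\alpha^{12})/[64(1+\alpha^2)^2(1+\alpha^4+\alpha^8+\alpha^{12})]$
3/8 | 10010100* | $\alpha^4(1-\alpha^2)(1+\alpha^4+\alpha^{10})/(1-\alpha^{16})$ | $(1-\alpha^2)^2(15+12\alpha^2+7\alpha^4+16\alpha^6+7\alpha^8+12\alpha^{10}+15\alpha^{12})/[64(1+\alpha^2)^2(1+\alpha^4+\alpha^8+\alpha^{12})]$
3/8 | 11001000 | $\alpha^6(1-\alpha^2+\alpha^6-\alpha^{10})/(1-\alpha^{16})$ | $(1-\alpha^2)^2(15+28\alpha^2+23\alpha^4+16\alpha^6+23\alpha^8+28\alpha^{10}+15\alpha^{12})/[64(1+\alpha^2)^2(1+\alpha^4+\alpha^8+\alpha^{12})]$
2/9 | 110000000 | $\alpha^{14}(1-\alpha^4)/(1-\alpha^{18})$ | $2(1-\alpha^2)^2(7+12\alpha^2+15\alpha^4+16\alpha^6+15\alpha^8+12\alpha^{10}+7\alpha^{12})/[81(1+\alpha^2+\alpha^4)(1+\alpha^6+\alpha^{12})]$
2/9 | 101000000 | $\alpha^{12}(1-\alpha^2+\alpha^4-\alpha^6)/(1-\alpha^{18})$ | $2(1-\alpha^2)^2(7+3\alpha^2+15\alpha^4+7\alpha^6+15\alpha^8+3\alpha^{10}+7\alpha^{12})/[81(1+\alpha^2+\alpha^4+\alpha^6+\alpha^8+\alpha^{10}+\alpha^{12}+\alpha^{14}+\alpha^{16})]$
2/9 | 100100000 | $\alpha^{10}(1-\alpha^2+\alpha^6-\alpha^8)/(1-\alpha^{18})$ | $2(1-\alpha^2)^2(7+3\alpha^2+6\alpha^4+7\alpha^6+6\alpha^8+3\alpha^{10}+7\alpha^{12})/[81(1+\alpha^2+\alpha^4+\alpha^6+\alpha^8+\alpha^{10}+\alpha^{12}+\alpha^{14}+\alpha^{16})]$
2/9 | 100010000* | $\alpha^8(1-\alpha^2+\alpha^8-\alpha^{10})/(1-\alpha^{18})$ | $2(1-\alpha^2)^2(7+3\alpha^2+6\alpha^4-2\alpha^6+6\alpha^8+3\alpha^{10}+7\alpha^{12})/[81(1+\alpha^2+\alpha^4+\alpha^6+\alpha^8+\alpha^{10}+\alpha^{12}+\alpha^{14}+\alpha^{16})]$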
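The lower bounds on $\alpha$ quoted for $K = 57$ and $K = 80$ can be recovered numerically from Eq. (44). The sketch below evaluates $f(\alpha, K)$ and locates its sign change by bisection; the bracketing interval $[0.99, 0.999]$ and the function names are assumptions made for this illustration, not part of the original analysis.

```python
import math

def f(alpha, K):
    """Left-hand side of Eq. (44) for the sequence with K zeros and one 1."""
    z_sq = alpha ** (2 * K) * (1 - alpha**2) / (1 - alpha ** (2 * K + 2))
    z = math.sqrt(z_sq)
    num = alpha * z + math.sqrt(1 - alpha**2 + alpha**2 * z_sq)
    den = math.sqrt(1 - alpha**2 * z_sq) + alpha * math.sqrt(1 - z_sq)
    return 1 / math.sqrt(K) - num / den

def alpha_threshold(K, lo=0.99, hi=0.999, iterations=60):
    """Bisection for the sign change of f, assuming f(lo, K) < 0 < f(hi, K)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid, K) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for K in (57, 80):
    print(K, round(alpha_threshold(K), 4))
```

The thresholds found in this way can be compared with the values 0.9967 and 0.9983 read off from Fig. 7.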



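5. Variance of the periodic sequences

Next, we compute the variance of the periodic, stationary state $\{\hat{x}_{2,j+1}^2, \hat{x}_{2,j+2}^2, \ldots, \hat{x}_{2,j+N}^2\}$. The expression of the variance reads

$$\Delta^2 = \frac{1}{N}\sum_{j=0}^{N-1}\left(\hat{x}_{2,n+j}^2 - \Theta\right)^2 = \frac{1}{N}\sum_{j=0}^{N-1}\hat{x}_{2,n+j}^4 - \Theta^2. \tag{45}$$

Using

$$\hat{x}_{2,n+j+1}^4 = \alpha^4\hat{x}_{2,n+j}^4 + (1-\alpha^2)^2\Theta_{n+j+1} + 2\alpha^2(1-\alpha^2)\Theta_{n+j+1}\hat{x}_{2,n+j}^2, \tag{46}$$

we obtain

$$\begin{aligned}
\frac{1}{N}\sum_{j=0}^{N-1}\hat{x}_{2,n+j}^4 &= \frac{(1-\alpha^2)^2}{1-\alpha^4}\left[\Theta + \frac{2\alpha^2}{1-\alpha^{2N}}\frac{1}{N}\sum_{j=0}^{N-1}\sum_{i=0}^{N-1}\alpha^{2i}\Theta_{n+j+1}\Theta_{n+j-i}\right] \\
&= \frac{1-\alpha^2}{1+\alpha^2}\left[\Theta + \frac{2\alpha^2}{1-\alpha^{2N}}\frac{1}{N}\sum_{j=0}^{N-1}\sum_{i=0}^{N-1}\alpha^{2i}\Theta_{n+j+i+1}\Theta_{n+j}\right] \\
&= \frac{1-\alpha^2}{1+\alpha^2}\left[\frac{1+\alpha^{2N}}{1-\alpha^{2N}}\Theta + \frac{2\alpha^2}{1-\alpha^{2N}}\frac{1}{N}\sum_{j=0}^{N-1}\sum_{i=0}^{N-2}\alpha^{2i}\Theta_{n+j+i+1}\Theta_{n+j}\right],
\end{aligned} \tag{47}$$

where in the last step, we have taken out from the double sum all terms of the form $\Theta_{n+j}\Theta_{n+j}$.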



In Table I we present analytical results for the initial points and variances of some simple sequences $\{\Theta_1, \ldots, \Theta_q\}$ with periods $q$ and $\Theta = p/q$ where both $p$ and $q$ are integers. If $p$ and $q$ have a common factor $c$, as is the case for $p = 2$ and $q = 8$, the problem simplifies to the case $(p/c)/(q/c)$. The sequences $\{\Theta_1, \ldots, \Theta_q\}$ marked with a * yield the smallest variance $\Delta^2$. These are exactly the sequences that the DLM generates in the stationary regime, provided $1 - \alpha$ is sufficiently small ($\alpha = 0.99$ is sufficient for the $(p,q)$-cases presented in Table I).

In general, any sequence of 0's and 1's that begins with a 1 can be viewed as a concatenation of subsequences that start with a 1 followed by one or more 0's. The examples in Table I suggest that the sequences $\{\Theta_1, \ldots, \Theta_q\}$ with the smallest variance consist of $n_1$ subsequences of length $L_1 = \lfloor q/p \rfloor \le q/p$ and $n_2$ subsequences of length $L_2 = \lceil q/p \rceil = L_1 + 1 > q/p$. We have not been able to prove that in general this is the structure of the minimum variance solution. However, the relation between the minimum variance solution and the ground state configurations of a one-dimensional lattice model, to be discussed next, suggests that this may well be the case.


6. Generalized one-dimensional Wigner lattice

From Eqs. (45) and (47), it follows that minimizing the variance $\Delta^2$ is tantamount to finding the periodic sequence $\{\Theta_{n+1}, \Theta_{n+2}, \ldots, \Theta_{n+N};\ \Theta_n = \Theta_{n+N}\}$ that minimizes the last term in Eq. (47), subject to the constraint $\Theta = N^{-1}\sum_{j=1}^{N}\Theta_{n+j}$. We now show that solving for the sequence that yields the lowest variance amounts to finding the ground state configuration of a classical many-body system.

If we interpret $\Theta_j = 0$ $(1)$ as the absence (presence) of a particle at the site $j$ of a one-dimensional lattice, then the last term in Eq. (47) can be written as

$$\sum_{i=0}^{N-1}\sum_{k=0}^{N-2} V_k\, n_i\, n_{i+k+1}, \tag{48}$$

where $V_k = \alpha^{2k}$ and $n_i$ takes the value 0 or 1 if the site $i$ is empty or occupied, respectively. Clearly, if $\alpha < 1$, the potential $V_k$ satisfies the two conditions

$$\lim_{k\to\infty} V_k = 0, \qquad V_{k-1} + V_{k+1} \ge 2V_k, \quad k > 1. \tag{49}$$

The density of particles $\rho = N^{-1}\sum_{j=1}^{N} n_j$ is given by $\rho = \Theta$.

In the limit $N \to \infty$, the problem of finding the ground state configuration of particles for a system defined by the Hamiltonian

$$H = \sum_{i}\sum_{k \ge 0} V_k\, n_i\, n_{i+k+1}, \tag{50}$$

and satisfying the two conditions Eq. (49) was solved by Hubbard [20]. For any density $\rho = p/q$ where $p$ and $q$ are integers with no common factor, the ground state of Eq. (50) is periodic with period $q$ and $p$ particles in each period [20].

FIG. 8: (color online) The error $e(N)$ defined by Eq. (52) as a function of the number of events $N$ for $M = 100$ (corresponding to 101 different input angles). For each $m = 0, \ldots, M$, we generate $N$ input events, each input event carrying the message $y_{n+1} = (\cos(\arcsin\sqrt{m/M}), \sin(\arcsin\sqrt{m/M}))$. Note that $y_{n+1}^2$ is a vector of rational numbers. Solid (red) line: Probabilistic Bernoulli-type processor (see Section II B). Dashed (green) line: Deterministic learning machine (see Section III). Dotted (blue) line: Modified deterministic learning machine, see Eq. (51). In all the DLM simulations, $\alpha = 0.9995$ and the first 10000 events were discarded to allow the DLM to approach the stationary state.

Hubbard gives an algorithm to generate the ground state configuration for a pair $(p,q)$ and calls these ground state configurations generalized one-dimensional Wigner lattices [20]. His algorithm also generates the sequences $\{\Theta_1, \ldots, \Theta_q\}$ in Table I that are marked with a *. This
is not a surprise: The periodic sequences with the smallest variance $\Delta^2$ are also the ground state configurations of model Eq. (50). Extensive numerical tests for $q = 2, \ldots, 10000$ and $1 < p < q$ (results not shown) confirm that the ground state configurations generated by Hubbard's algorithm are the same as the periodic sequences generated by the DLM in the stationary regime, for a fixed input $y = (y_1, y_2)$ with $y_1^2 = 1 - p/q$, $y_2^2 = p/q$, and sufficiently small $1 - \alpha$.
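The correspondence with the lattice model can be illustrated on small rings. The sketch below does not implement Hubbard's algorithm; as a stand-in it uses the standard "most uniform" (balanced) placement $n_j = \lfloor (j+1)p/q \rfloor - \lfloor jp/q \rfloor$, which is an assumption of this example, and compares its energy, computed from the double sum of Eq. (48) with periodic boundary conditions, with the minimum over all configurations found by exhaustive search.

```python
from itertools import combinations

def energy(n, alpha):
    """Double sum of Eq. (48) with V_k = alpha^(2k) on a ring of len(n) sites."""
    N = len(n)
    return sum(alpha ** (2 * k) * n[i] * n[(i + k + 1) % N]
               for i in range(N) for k in range(N - 1))

def balanced(p, q):
    """Most uniform placement of p particles on q sites (illustrative construction)."""
    return [((j + 1) * p) // q - (j * p) // q for j in range(q)]

def ground_state_energy(p, q, alpha):
    """Minimum of the energy over all configurations with p particles on q sites."""
    return min(energy([1 if j in occ else 0 for j in range(q)], alpha)
               for occ in combinations(range(q), p))

alpha = 0.99
for p, q in [(2, 5), (3, 8), (2, 9), (3, 10)]:
    conf = balanced(p, q)
    print(p, q, "".join(map(str, conf)),
          energy(conf, alpha) - ground_state_energy(p, q, alpha))
```

For these small cases the balanced configuration reaches the minimum energy, and up to a cyclic shift it coincides with the starred sequences of Table I.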



C. Performance analysis 

The non-analytic character of the DLM algorithm and the complicated dependence on the parameter $\alpha$ make it difficult for us to prove more rigorous results about the DLM dynamics than those presented earlier. On the other hand, it is very easy to study the dynamics numerically. Extensive simulation work (results not shown) demonstrates that, with a proper choice of $\alpha$ (see Sections III B 2 and III B 4), a DLM can encode all rational numbers $n/N$ for $n = 0, \ldots, N$. Thus, for each input angle $\psi$ for which $\sin^2(\psi - \phi)$ is a rational number, there is a stationary state in which the DLM generates a unique, periodic sequence (with minimum variance) of 1's and 0's from which the value of $\sin^2(\psi - \phi)$ can be determined with very high precision.

FIG. 9: (color online) Same as in Fig. 8 except that the input events carry the message $y_{n+1} = (\cos(m\pi/2M), \sin(m\pi/2M))$ for $m = 0, \ldots, M$.

The update rule that the DLM uses is quite subtle, and we demonstrate this by changing the rules Eqs. (19) and (20) to

$$x_{2,n+1}^2 = \alpha^2 x_{2,n}^2 + (1-\alpha^2)\Theta_{n+1}, \qquad \Theta_{n+1} = \begin{cases} 1, & y_{2,n+1}^2 > x_{2,n}^2, \\ 0, & \text{otherwise}. \end{cases} \tag{51}$$

For $0 < \theta < \pi/2$ (the case of interest for the present analysis), this rule tells the machine to rotate its internal vector towards the input vector $y_{n+1} = (y_{1,n+1}, y_{2,n+1})$. In contrast, the DLM that operates according to the rules of Eqs. (19) and (20) may decide to rotate its internal vector away from the input vector.

For a quantitative comparison of the performance of the probabilistic processor, the DLM defined by the rules of Eqs. (19) and (20), and the DLM defined by the rules of Eq. (51), we carry out the procedure that follows:

1. set $M = 100$ and choose $\phi \in [0°, 360°[$ randomly

2. for each $m = 0, \ldots, M$

3. set $e_m(N) = 0$

4. compute $\psi_m - \phi = \arcsin\sqrt{m/M}$

5. for $n = 1, \ldots, N$

6. generate an input event carrying the message $y_n = (\cos(\arcsin\sqrt{m/M}), \sin(\arcsin\sqrt{m/M}))$

7. count the number $K$ of 1 events generated by the processor

8. end loop over $n$

9. compute $\psi'_m - \phi = \arcsin\sqrt{K/N}$

10. set $e_m(N) = e_m(N) + (\psi'_m - \psi_m)^2$

11. end loop over $m$

The number accumulated in $e_m(N)$ yields

$$e(N) = \sqrt{\frac{1}{M+1}\sum_{m=0}^{M} e_m(N)} = \sqrt{\frac{1}{M+1}\sum_{m=0}^{M}\left(\psi'_m - \psi_m\right)^2}, \tag{52}$$



which is the error averaged over $M + 1$ different pairs of (input, output) angles for a fixed number $N$ of input messages. Fig. 8 shows the error $e(N)$ as a function of the number of events $N$. In this case, $\cos^2(\psi_m - \phi)$ and $\sin^2(\psi_m - \phi)$ are rational numbers and the results of Fig. 8 confirm that the DLM performs very well, much better than the probabilistic processor. Fig. 8 also shows that replacing the update rule Eq. (20) by Eq. (51) has a large impact on the performance of a deterministic learning machine.
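The procedure of steps 1 to 11 and Eq. (52) can be sketched as follows for the Bernoulli-type processor and for the DLM. This is an illustration under several assumptions, not the code used for Fig. 8: the DLM update is written for first-quadrant inputs in the form of Eqs. (41) and (42), the probabilistic processor emits a 1 event with probability $\sin^2\theta$, the orientation $\phi$ is set to zero, and smaller values of $M$ and $N$ are used to keep the run short. All function names are hypothetical.

```python
import math
import random

def dlm_count(theta, n_events, alpha=0.9995, n_discard=10000):
    """Number of 1 events among n_events, after discarding n_discard transients."""
    y1, y2 = math.cos(theta), math.sin(theta)
    x1, x2, ones = 1.0, 0.0, 0
    for n in range(n_discard + n_events):
        a1, a2 = math.sqrt(1 - alpha**2 * x2**2), alpha * x2               # Theta = 0
        b1, b2 = alpha * x1, math.sqrt(1 - alpha**2 + alpha**2 * x2**2)    # Theta = 1
        if -(b1 * y1 + b2 * y2) < -(a1 * y1 + a2 * y2):  # compare costs C = -x.y
            x1, x2, th = b1, b2, 1
        else:
            x1, x2, th = a1, a2, 0
        if n >= n_discard:
            ones += th
    return ones

def bernoulli_count(theta, n_events, rng):
    """Number of 1 events of a processor that emits 1 with probability sin^2(theta)."""
    p = math.sin(theta) ** 2
    return sum(1 for _ in range(n_events) if rng.random() < p)

def rms_error(count_ones, M, N):
    """Error e(N) of Eq. (52) for a processor given as count_ones(theta, N)."""
    total = 0.0
    for m in range(M + 1):
        psi = math.asin(math.sqrt(m / M))       # step 4
        K = count_ones(psi, N)                  # steps 5 to 8
        psi_est = math.asin(math.sqrt(K / N))   # step 9
        total += (psi_est - psi) ** 2           # step 10
    return math.sqrt(total / (M + 1))

rng = random.Random(1)
M, N = 20, 1000
err_dlm = rms_error(dlm_count, M, N)
err_prob = rms_error(lambda th, n: bernoulli_count(th, n, rng), M, N)
print("DLM:", err_dlm, " Bernoulli:", err_prob)
```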

If we replace $\psi_m - \phi = \arcsin\sqrt{m/M}$ by $\psi_m - \phi = m\pi/2M$ in step 4 of the procedure described earlier, then $\sin^2(\psi_m - \phi)$ is not necessarily a rational number and this affects the performance of the DLM, as shown in Fig. 9. A closer look at the DLM data for different $m$ (results not shown) reveals that the large increase of the error is due to the relatively poor accuracy for $m \approx 0$ and $m \approx M$. This is hardly a surprise: From Sections III B 2 and III B 4 we know that the choice of $\alpha$ is more important for $\theta \approx n\pi/2$ than it is for $\theta \approx n\pi/2 + \pi/4$. Therefore, if optimal performance for all $\theta$ is crucial, it is necessary to adjust $\alpha$ dynamically by another learning process. We leave this topic for future research.



IV. RELEVANCE OF THE LEARNING 
PROCESS 

The fundamental difference between the simple probabilistic processor of Section II and the DLM of Section III is that the latter has a learning capability. Elsewhere, we have shown that learning is an essential ingredient of networks of probabilistic or deterministic processors that are able to simulate, event-by-event, quantum interference phenomena and universal quantum computation [14, 15, 16, 17]. The fundamental reason for this is that some form of communication between individual events is required in order to simulate (classical or quantum) interference phenomena. Although the Bernoulli-type probabilistic processor of Section II satisfies our criteria 1 and 2 for an efficient processor, it generates uncorrelated events and any form of communication between events is absent. Therefore the probabilistic processor of Section II cannot simulate interference phenomena but the DLM of Section III can because it has the additional feature of being able to learn from previous events.

As a non-trivial illustration of the importance of the learning process, we consider the interferometer depicted in Fig. 10 [21]. This interferometer consists of two chained Mach-Zehnder interferometers. Photons leave the source (not shown) located at the bottom of the left-most vertical line. The beam splitters, represented by the large rectangles, transmit or reflect photons with probability 1/2. After leaving the first beam splitter in the vertical or horizontal direction, the photons experience a time delay that is determined by the length of the optical path from one beam splitter to the next. The length of each path is variable, as indicated schematically by the controls on the horizontal lines. In a wave mechanical description, the time delays correspond to changes in the phase of the wave. The thin, 45°-tilted lines act as perfect mirrors.

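In quantum theory, the presence of photons in the input modes 0 or 1 of the interferometer is represented by the probability amplitudes $(a_0, a_1)$. According to quantum theory, the amplitudes $(b_0, b_1)$ of the photons in the output modes 0 and 1 of a beam splitter are given by

$$\begin{pmatrix} b_0 \\ b_1 \end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix}\begin{pmatrix} a_0 \\ a_1 \end{pmatrix}. \tag{53}$$

The amplitudes to observe a photon in the output modes 0 and 1 of one Mach-Zehnder interferometer of Fig. 10 are given by

$$\begin{pmatrix} b_2 \\ b_3 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix}\begin{pmatrix} e^{i\phi_0} & 0 \\ 0 & e^{i\phi_1} \end{pmatrix}\begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix}\begin{pmatrix} a_0 \\ a_1 \end{pmatrix}. \tag{54}$$

The amplitudes to observe a photon in the output modes 0 and 1 of two chained Mach-Zehnder interferometers are given by

$$\begin{pmatrix} b_4 \\ b_5 \end{pmatrix} = \frac{1}{2\sqrt{2}}\begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix}\begin{pmatrix} e^{i\phi_2} & 0 \\ 0 & e^{i\phi_3} \end{pmatrix}\begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix}\begin{pmatrix} e^{i\phi_0} & 0 \\ 0 & e^{i\phi_1} \end{pmatrix}\begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix}\begin{pmatrix} a_0 \\ a_1 \end{pmatrix}. \tag{55}$$

In Eqs. (54) and (55), the entries $e^{i\phi_j}$ for $j = 0, 1, 2, 3$ implement the phase shifts that result from the time delays on the corresponding path (including the phase shifts due to the presence of the perfect mirrors). The probabilities to detect a photon in either output mode 0 or 1 of the two chained Mach-Zehnder interferometers are given by $|b_4|^2$ or $|b_5|^2$, respectively.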
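The quantum mechanical reference values against which the event-by-event simulation is compared follow from a few complex multiplications. The sketch below is only an illustration: it assumes that all photons enter in input mode 0, that is $(a_0, a_1) = (1, 0)$, applies the beam splitter matrix and the phase factors in the order beam splitter, phases $(\phi_0, \phi_1)$, beam splitter, phases $(\phi_2, \phi_3)$, beam splitter, and uses the phase settings quoted for the snapshot of Fig. 10. The function names are hypothetical.

```python
import cmath
import math

def beam_splitter(a0, a1):
    """Output amplitudes of a 50/50 beam splitter with matrix (1/sqrt 2) [[1, i], [i, 1]]."""
    s = 1 / math.sqrt(2)
    return s * (a0 + 1j * a1), s * (1j * a0 + a1)

def phase_shift(a0, a1, phi0_deg, phi1_deg):
    """Multiply the two path amplitudes by their phase factors exp(i phi)."""
    return (a0 * cmath.exp(1j * math.radians(phi0_deg)),
            a1 * cmath.exp(1j * math.radians(phi1_deg)))

def chained_mzi(a, phis_deg):
    """Detection probabilities after each beam splitter of two chained interferometers."""
    b0, b1 = beam_splitter(*a)
    b2, b3 = beam_splitter(*phase_shift(b0, b1, phis_deg[0], phis_deg[1]))
    b4, b5 = beam_splitter(*phase_shift(b2, b3, phis_deg[2], phis_deg[3]))
    return [abs(b) ** 2 for b in (b0, b1, b2, b3, b4, b5)]

probs = chained_mzi((1.0, 0.0), (152.0, 302.0, 0.0, 342.0))
for j, pj in enumerate(probs):
    print(f"|b_{j}|^2 = {pj:.4f}")
```

Because every stage is unitary, each pair of probabilities adds up to one, which gives a simple consistency check on the implementation.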

Using DLM networks, it is possible to reproduce the wave-like behavior by an event-by-event, particle-like simulation without using wave mechanics [14, 15, 16, 17]. Elsewhere we have shown that DLM networks can simulate, event by event, single-photon beam splitter and (modified) Mach-Zehnder interferometer experiments [22, 23].

Fig. 10 shows the schematic diagram of the DLM network that performs the event-by-event simulation of the two chained Mach-Zehnder interferometers [21]. Particles emerge one-by-one from a source (not shown) located at the bottom of the left-most vertical line. At any time, there is at most one particle (represented by the small sphere) in the system. The number of particles that have left the source is given by $N$.

FIG. 10: (color online) Snapshot of the event-by-event simulator of two concatenated Mach-Zehnder interferometers [21]. The main panel shows the layout of the interferometer. Particles emerge from a source (not shown) located at the bottom of the left-most vertical line. After leaving the first beam splitter in either the vertical or horizontal direction, the particles experience time delays that are specified by the controls on the lines. In this example, the time delays correspond to the phase shifts $\phi_0 = 152°$, $\phi_1 = 302°$, $\phi_2 = 0°$, and $\phi_3 = 342°$ in the quantum mechanical description. The thin, 45°-tilted lines act as perfect mirrors. When a particle leaves the system at the top right, it adds to the count of either detector $N_4$ or $N_5$. Additional detectors ($N_0$, $N_1$, $N_2$, and $N_3$) count the number of particles on the corresponding lines. The other cells give the ratio of the detector counts to the total number of particles (messages) processed and also the corresponding probability of the quantum mechanical description. At any time, the user can choose between a strictly deterministic and a probabilistic event-by-event simulation by pressing the buttons at the top of the control panel.

Each particle carries its own clock. There is a one-to-one correspondence between the direction of the hand of the clock and the message $y_{n+1} = (y_{1,n+1}, y_{2,n+1})$. The clock is read and manipulated by the beam splitters, represented by the large rectangles. Each beam splitter contains two DLMs [14]. The internal structure of the DLM network that performs the task of a beam splitter is described in detail elsewhere [14, 15, 16, 17], so there is no need to repeat it here. Of course, these networks are the same for the three beam splitters.

After leaving the first beam splitter in either the vertical or horizontal direction (but never in both), the particle experiences a time delay that is determined by the controls on the lines. This time delay is implemented as a rotation of the hand of the particle's clock. When a particle leaves the system at the top right, it adds to the count of either detector $N_4$ or $N_5$. Additional detectors ($N_0$, $N_1$, $N_2$, and $N_3$) count the number of particles on the corresponding lines. The label of $\phi_j$ in the quantum mechanical description is the same as the label of the corresponding counter $N_j$. The other cells give the ratio of the detector counts to the total number of particles (messages) processed and also the corresponding probability of the quantum mechanical description. At any time, the user can choose between a strictly deterministic and a probabilistic event-by-event simulation by pressing the buttons at the top of the control panel. We emphasize that this DLM-based simulation is dynamic and adaptive in all respects: During the simulation, the user can change any of the controls and after a short transient period, the DLM-network generates output events according to the quantum mechanical probabilities.

The snapshot in Fig. 10 is taken after $N = 236$ particles have been generated by the source (with one particle still under way). The numbers in the various corresponding fields clearly show that even after a modest number of events, this event-by-event simulation reproduces the quantum mechanical probabilities. Of course, this single snapshot is not a proof that the event-by-event simulation also works for other choices of the delays.

An event-by-event simulation correctly reproduces the quantum mechanical probabilities if and only if $N_j/N \approx |b_j|^2$ for $j = 0, 1, 2, 3$, for any choice of the delays (phases). Very extensive tests, reported elsewhere, demonstrate that DLM-networks accurately reproduce the probabilities of the quantum theory.

In the event-by-event simulation, interference is a di- 
rect result of the learning process that takes place in each 
DLM. In the case at hand, the three (identical) beam 
splitters contain the learning machines. We emphasize 
that there is no direct communication between the dif- 
ferent beam splitters. All the information is carried by 
the particle while it is routed through the network. This 
is essential for the simulation to satisfy the physical cri- 
terion of causality. 



V. SUMMARY 

In this paper we ask what the optimal design would be of a processor that can process and count incoming individual objects carrying information represented by an angle $\psi$, but that cannot measure $\psi$ directly, if it has to give the most accurate estimate of the angle $\psi$. In other words, how can we simulate the operation of a photon polarizer?

First, we construct a processor operating according to the rules of probability theory. This so-called probabilistic processor uses random numbers to transform the incoming angle $\psi$, that is the information carried by the incoming single objects, into a sequence of discrete output events labeled by $\pm 1$. The numbers of $+1$ and $-1$ events only depend on the difference $\theta = \psi - \phi$ between the unknown angle $\psi$ and the orientation $\phi$ of the processor. We design the probabilistic processor such that the result of the transformation process is probabilistic (Bernoulli trials), rotationally invariant and satisfies our criteria 1 and 2 for an efficient processor. For fixed $\phi$ and $N$ incoming objects, the observer, using the probabilistic processor to measure $\psi$ as accurately as possible, will get most out of the data if the processor sends $N\cos^2\theta$ ($N\sin^2\theta$) events to the apparatus that detects the $+1$ ($-1$) events. The number of angles $\theta$ that the observer can distinguish is proportional to $\sqrt{N}$. The probabilistic processor is thus a model for an ideal polarizer. It produces data that agrees with Malus' law. However, it is important to note that to obtain this result we do not use any law of physics in the design of the processor. We do not use the probability distributions derived in quantum theory to generate the $\pm 1$ events but we design the probabilistic processor in such a way that these probability distributions come out as a result of efficient processing of incoming data by the processor. Hence, we can ask the following important question. Can other quantum phenomena also appear as a result of efficient data processing?

In order to answer this question we follow another 
route. Although the Bernoulli type probabilistic proces- 
sor can simulate the classical and quantum properties of 
a photon polarizer, it cannot simulate interference phe- 
nomena. To overcome this problem we could design a 
probabilistic processor that does not generate Bernoulli 
events but correlated output events. However, we choose 
to design processors that use a deterministic algorithm 
with a primitive learning capability to transform the in- 
coming events into a sequence of discrete output events. 
This type of processors we call deterministic processors 
or deterministic learning machines. 

Therefore, as a second step, we construct a determin- 
istic processor that models a photon polarizer, that is 
a deterministic processor that generates output events 
according to Malus' law. Just as the probabilistic pro- 
cessor, the deterministic processor has one input channel 
and two output channels labeled by +1 and —1, respec- 
tively. Apart from this the deterministic processor also 
has an infernal vector with two real entries. The input 
messages to the deterministic processor are unit vectors 
y„+i = (cosé' Il +i,siní9 Il+ i) for n = 0, . . . , N and where 
On = i'n — 4>- The deterministic processor learns from 
the input events by updating its intcrnal vector and uses 
this intcrnal vector in a complctely deterministic decision 
process to send out a +1 or a — 1 event. Hence, the order 
in which the +1 and —1 events are sent is deterministic. 
Apart from being deterministic, the result of the trans- 
formation process is rotational invariant and satisfies cri- 
teria 1 and 2 of Section which are exactly the same 
requirements as the ones used to construct the proba- 
bilistic processor. Further analysis of the output events 
shows that the number of +1 and —1 output events agrees 
with Malus' law. Hence, the photon polarizer can also be 
modelled by a deterministic processor. As in the case of 
the probabilistic processor, also in this case we did not 
use any laws of physics to design the processor. The 
number of angles 9 that the observer, using the deter- 
ministic processor to measure tf>, can distinguish is equal 
to N + 1. Hence, in this respect the deterministic pro- 
cessor performs much better than the probabilistic one. 
However, the more important and fundamental difference 
between the probabilistic and the deterministic processor 
is that the latter has a learning capability. Learning is 
an essential ingredient to simulate interference phenom- 
ena since it correlates the output events. As an example 
we show the event-by-event simulation of photons routed 
through two chained Mach-Zehnder interferometers by 
using a network of deterministic processors. We show 
that the quantum mechanical probabilities are also re- 
produced for this interference experiment. 
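
As a hedged illustration of how a processor with an internal two-component vector can generate events deterministically and still yield Malus' law, consider the following minimal Python sketch. It is not the update rule of the processor analyzed in this paper. The update x ← αx + (1 − α)e_k, the target vector (cos²θ, sin²θ), the parameter `alpha`, and the names `run_processor` and `fraction_plus` are all assumptions made for illustration only.

```python
import math


def run_processor(theta, n_events, alpha=0.99):
    """Feed n_events identical messages (cos theta, sin theta) to a toy
    deterministic processor and return its list of +1/-1 output events.

    Assumed (hypothetical) rule: the internal vector x is updated as
    x <- alpha * x + (1 - alpha) * e_k, where k in {0, 1} is chosen so that
    the updated x is closest to the squared message components.
    """
    x = [0.5, 0.5]  # internal vector with two real entries
    y = (math.cos(theta), math.sin(theta))  # input message (unit vector)
    target = (y[0] ** 2, y[1] ** 2)  # assumed target for the internal vector
    outputs = []
    for _ in range(n_events):
        best_k, best_cost, best_x = 0, None, None
        for k in (0, 1):
            cand = [alpha * x[0], alpha * x[1]]
            cand[k] += 1.0 - alpha
            cost = (cand[0] - target[0]) ** 2 + (cand[1] - target[1]) ** 2
            # Strict comparison: ties are resolved in favour of k = 0.
            if best_cost is None or cost < best_cost:
                best_k, best_cost, best_x = k, cost, cand
        x = best_x  # "learning": the internal vector tracks the input
        outputs.append(+1 if best_k == 0 else -1)
    return outputs


def fraction_plus(theta, n_events=20000, alpha=0.99):
    """Relative frequency of +1 events for a fixed angle theta."""
    return run_processor(theta, n_events, alpha).count(1) / n_events


if __name__ == "__main__":
    for deg in (0, 30, 45, 60, 90):
        th = math.radians(deg)
        print(deg, fraction_plus(th), math.cos(th) ** 2)
```

No random numbers are used, so repeating a run with the same input reproduces the same sequence of events. In this sketch the relative frequency of +1 events approaches cos²θ more closely as `alpha` approaches 1 and the number of events grows.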

In conclusion, processors that efficiently process incoming
data in the form of single events can simulate some quantum
phenomena, such as the recovery of Malus' law for a photon
polarizer. However, in order to simulate quantum
interference the processor should in addition have the
capability of learning. Most importantly, the present work
demonstrates that viewing quantum systems as efficient data
processors provides a framework to construct adaptive,
dynamical systems that can simulate quantum interference on
an event-by-event basis, without using concepts of quantum
theory.
APPENDIX A: ON THE USE OF THE
CRAMÉR-RAO INEQUALITY

In Frieden's approach the Cramér-Rao inequality plays
a central role to motivate the use of the Fisher information
as a measure of the expected error in measurements [10].
From probability theory it is well known that the
Cramér-Rao inequality sets a lower bound to the variance of
an estimator. Here we prove that, within the limitations set
by our design criteria, the estimation procedure is
efficient in the sense that it satisfies the Cramér-Rao
inequality with equality and that this inequality reduces to
a trivial identity that contains no information. For the
convenience of the reader, we repeat the derivation of the
Cramér-Rao inequality for the case of interest.

Writing Eq. (2) as

$$\sum_{x=\pm1} \left(x - f(\theta)\right) p(x|\theta) = 0, \tag{A1}$$



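and taking the derivative with respect to $\theta$ we obtain

$$\sum_{x=\pm1} \left(x - f(\theta)\right) \frac{\partial p(x|\theta)}{\partial\theta} = \frac{\partial f(\theta)}{\partial\theta}. \tag{A2}$$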



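Rewriting Eq. (A2) as

$$\sum_{x=\pm1} \left[\left(x - f(\theta)\right)\sqrt{p(x|\theta)}\right] \left[\frac{1}{\sqrt{p(x|\theta)}}\, \frac{\partial p(x|\theta)}{\partial\theta}\right] = \frac{\partial f(\theta)}{\partial\theta}, \tag{A3}$$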



and using the Schwarz inequality gives the Cramér-Rao inequality

$$\left[\sum_{x=\pm1} \left(x - f(\theta)\right)^2 p(x|\theta)\right] I_F = \mathrm{Var}(x)\, I_F \ge \left[\frac{\partial f(\theta)}{\partial\theta}\right]^2, \tag{A4}$$



where

$$I_F = \sum_{x=\pm1} \frac{1}{p(x|\theta)} \left[\frac{\partial p(x|\theta)}{\partial\theta}\right]^2 \tag{A5}$$



is the Fisher information. With the use of Eq. (1) we can
write $I_F$ as

$$I_F = \frac{1}{p(1|\theta)\left(1 - p(1|\theta)\right)} \left[\frac{\partial p(1|\theta)}{\partial\theta}\right]^2, \tag{A6}$$

which is identical to Eq. (…). Any estimation procedure
that satisfies the bound in Eq. (A4) with equality is called
efficient. Using Eq. (A6) and
$\mathrm{Var}(x) = 4 p(1|\theta)\left(1 - p(1|\theta)\right)$ we find

$$\mathrm{Var}(x)\, I_F = 4 \left[\frac{\partial p(1|\theta)}{\partial\theta}\right]^2 = \left[\frac{\partial f(\theta)}{\partial\theta}\right]^2. \tag{A7}$$

Hence, the inequality Eq. (A4) reduces to a trivial identity
from which we cannot deduce anything useful.
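
The identity in Eq. (A7) can be checked numerically. The short Python sketch below evaluates Var(x), the Fisher information of Eq. (A5), and ∂f/∂θ by central finite differences for a two-outcome variable x = ±1. The specific choice p(1|θ) = cos²θ (Malus' law), the step size `h`, and all function names are assumptions made for this illustration. The identity itself does not depend on the form of p(1|θ).

```python
import math


def p(x, theta):
    """Assumed probabilities for the outcomes x = +1 and x = -1 (Malus' law)."""
    return math.cos(theta) ** 2 if x == 1 else math.sin(theta) ** 2


def ddtheta(g, theta, h=1e-5):
    """Central finite-difference approximation of dg/dtheta."""
    return (g(theta + h) - g(theta - h)) / (2.0 * h)


def f(theta):
    """Expectation value of x, f(theta) = sum_x x p(x|theta)."""
    return sum(x * p(x, theta) for x in (1, -1))


def variance(theta):
    """Var(x) = sum_x (x - f(theta))^2 p(x|theta)."""
    return sum((x - f(theta)) ** 2 * p(x, theta) for x in (1, -1))


def fisher(theta):
    """Fisher information, Eq. (A5), with numerical derivatives of p(x|theta)."""
    return sum(
        ddtheta(lambda t: p(x, t), theta) ** 2 / p(x, theta) for x in (1, -1)
    )


def cramer_rao_gap(theta):
    """Left-hand side minus right-hand side of Eq. (A4); zero means equality."""
    return variance(theta) * fisher(theta) - ddtheta(f, theta) ** 2


if __name__ == "__main__":
    for theta in (0.3, 0.8, 1.2):
        print(theta, variance(theta) * fisher(theta), cramer_rao_gap(theta))
```

Up to finite-difference error the gap vanishes at every angle where both probabilities are nonzero, consistent with the statement that the inequality holds with equality for a two-outcome variable.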